INDREX: In-database relation extraction
Abstract
The management of text data has a long-standing history. A particularly common task is extracting relations from text. Typically, the user performs this task with two separate systems: a relation extraction system and an SQL-based query engine for analytical tasks. During this iterative analytical workflow, the user must frequently ship data between these systems. Worse, the user must learn to manage both systems. Therefore, end users often desire a single system for both analytical and relation extraction tasks. We propose INDREX, a system that provides a single, comprehensive view of the whole process, combining relation extraction with later exploitation in SQL. The system permits a data-warehouse-style extract-transform-load of generic relations extracted from text documents and can support additional text mining analysis libraries or systems. Once generic relations are loaded, the user can define SQL queries on the extracted relations to discover higher-level semantics or to join them with other relational data. To execute this task, our system extends the SQL-based analytical capabilities of a columnar, massively parallel query processing engine with a broad set of user-defined functions and a data model that supports this task. Our white-box approach permits INDREX to benefit from the built-in query optimization and indexing techniques of the underlying query execution engine. Applications that support both text mining and analytical workflows leverage new analytical platforms based on the MapReduce framework and its open source Hadoop implementation. We compare our system against this baseline. We measure execution times for common workflows and demonstrate orders-of-magnitude improvements in execution time using INDREX. © 2014 Elsevier Ltd. All rights reserved.
Similar resources
Interactive Relation Extraction in Main Memory Database Systems
We present INDREX-MM, a main memory database system for interactively executing two interwoven tasks: declarative relation extraction from text and the exploitation of the extracted relations with SQL. INDREX-MM simplifies these tasks for the user with powerful SQL extensions for gathering statistical semantics, for executing open information extraction and for integrating relation candidates with domain specific data. ...
Semantic Relation Extraction from a Cultural Database
Semantic relation extraction aims to extract relation instances from natural language texts. In this paper, we propose a semantic relation extraction approach based on simple relation templates that determine relation types and their arguments. We attempt to reduce semantic drift of the arguments by using named entity models as semantic constraints. Experimental results indicate that our approa...
Exploring a Few Good Tuples From a Text Database
Information extraction from text databases is a useful paradigm to populate relational tables and unlock the considerable value hidden in plain-text documents. However, information extraction can be expensive, due to various complex text processing steps necessary in uncovering the hidden data. There are a large number of text databases available, and not every text database is necessarily rele...
Chemical-induced disease relation extraction via convolutional neural network
This article describes our work on the BioCreative-V chemical-disease relation (CDR) extraction task, which employed a maximum entropy (ME) model and a convolutional neural network model for relation extraction at inter- and intra-sentence level, respectively. In our work, relation extraction between entity concepts in documents was simplified to relation extraction between entity mentions. We ...
Review of Relation Extraction Methods: What Is New Out There?
Relation extraction is a part of Information Extraction and an established task in Natural Language Processing. This paper presents an overview of the main directions of research and recent advances in the field. It reviews various techniques used for relation extraction including knowledge-based, supervised and self-supervised methods. We also mention applications of relation extraction and id...
Journal title: Inf. Syst.
Volume: 53
Publication date: 2015